Corpus-Oriented Development of Japanese HPSG Parsers

Author

Kazuhiro Yoshida

Abstract

This paper reports the corpus-oriented development of a wide-coverage Japanese HPSG parser. We first created an HPSG treebank from the EDR corpus by using heuristic conversion rules, and then extracted lexical entries from the treebank. The grammar developed using this method attained wide coverage that could hardly be obtained by conventional manual development. We also trained a statistical parser for the grammar on the treebank, and evaluated the parser in terms of the accuracy of semantic-role identification and dependency analysis.

Similar resources

An Agent-based Parallel HPSG Parser for Shared-memory Parallel Machines

We describe an agent-based parallel HPSG parser that operates on shared-memory parallel machines. It efficiently parses real-world corpora by using a wide-coverage HPSG grammar. The efficiency is due to the use of a parallel parsing algorithm and the efficient treatment of feature structures. The parsing algorithm is based on the CKY algorithm, in which resolving constraints between a mother an...

Ambiguous Part-of-Speech Tagging for Improving Accuracy and Domain Portability of Syntactic Parsers

We aim to improve the performance of a syntactic parser that uses a part-of-speech (POS) tagger as a preprocessor. Pipelined parsers consisting of POS taggers and syntactic parsers have several advantages, such as the capability of domain adaptation. However, the performance of such systems on raw texts tends to be disappointing, as they are affected by the errors of automatic POS tagging. We att...

Efficient HPSG Parsing Algorithm with Array Unification

This paper presents a method for improving parsing performance of parsers for HPSG. The method was obtained by extending Torisawa’s parsing method for HPSG. His parsing method utilizes a CFG compiled from a given HPSG-based grammar, and the parser predicts the possible parse trees with the CFG. Since the amount of unification is reduced because of this prediction, parsing performance is improve...

Constituency Parsing of Bulgarian: Word- vs Class-based Parsing

In this paper, we report the results obtained with two constituency parsers trained on BulTreeBank, an HPSG-based treebank for Bulgarian. To reduce the data sparsity problem, we propose using Brown word clustering to perform off-line clustering and map the words in the treebank to create a class-based treebank. The observations show that when the classes outnumber the POS tags, the results are...

HPSG-based annotation scheme for corpora development and parsing evaluation

This paper proposes a formal framework for development and exploitation of a corpus, based on the HPSG linguistic theory. The formal representation of the annotation scheme facilitates the annotation process and ensures the quality of the corpus and its usage in different application scenarios. Also, evaluation over HPSG annotation scheme is discussed. The advantages of the approach are present...

Publication date: 2005
